Abstract

Consider a large Boolean network with a feed-forward structure. Given a probability distribution for the inputs, can one find (possibly small) collections of input nodes that determine the states of most other nodes in the network? To identify these nodes, a notion that quantifies the determinative power of an input over states in the network is needed. We argue that the mutual information (MI) between a subset of the inputs $\mathbf{X} = \{X_1, \dots, X_n\}$ of node $i$ and the function $f_i(\mathbf{X})$ associated with node $i$ quantifies the determinative power of this subset of inputs over node $i$. To study the relation of determinative power to sensitivity to perturbations, we relate the MI to measures of perturbations, such as the influence of a variable, in terms of inequalities. The result shows that, maybe surprisingly, an input that has large influence does not necessarily have large determinative power. The main tool for the analysis is Fourier analysis of Boolean functions. Whether a function is sensitive to perturbations or not, and which are the determinative inputs, depends on which coefficients the Fourier spectrum is concentrated on. We also consider unate functions, which play an important role in genetic regulatory networks. For those, a particular relation between the influence and MI is found. As an application of our methods, we analyze the large-scale regulatory network of E. coli numerically: We identify the most determinative nodes and show that a small set of those reduces the overall uncertainty of network states significantly. The network is also found to be tolerant to perturbations of its inputs, which can be seen from the Fourier spectrum of its functions.

1 Introduction



A Boolean network (BN) is a discrete dynamical system that is used, for example, to study and model a variety of biochemical networks such as genetic regulatory networks. BNs were introduced in the late 1960s by Kauffman [1, 2], who proposed to study random BNs as models of gene regulatory networks. Kauffman investigated their dynamical behavior and a phenomenon called self-organization. Aside from their original purpose, BNs were also used to model (small-scale) genetic regulatory networks; e.g., in [3, 4, 5] it was demonstrated that BNs are capable of reproducing the underlying biological processes (i.e., the cell cycle) well. Boolean models are also used to model large-scale networks, such as the Escherichia coli regulatory network [6], which is analyzed in Section 6. This network is, in contrast to Kauffman's automata and the regulatory networks considered in [3, 4, 5], not an autonomous system, since the genes' states are determined by external factors.

Concerning the analysis of BNs, measures of perturbations have received the most attention. Whether a random Boolean network operates in the so-called ordered or disordered regime is determined by whether a single perturbation, i.e., flipping the state of a node, is expected to spread or die out eventually. Kauffman [2] argues that biological networks must operate at the border of the ordered and disordered regime; hence they must be tolerant to perturbations to some extent. To our knowledge, a measure of determinative power of a node has neither been studied systematically nor related to measures of perturbations. Let us give an example where a notion of determinative power is needed: Consider a feed-forward network where the states of the nodes are controlled by the states in the input layer. We ask whether a possibly small set of inputs suffices to determine most states, i.e., reduces the uncertainty about the network's states significantly. For the E. coli regulatory network (see Section 6), this answers whether a small set of metabolites and other inputs determines most genes that account for E. coli's metabolism.

The setting in this paper is as follows. The state of each node in the network is viewed as an independent random variable with some distribution. This assumption applies, e.g., to networks with a tree-like topology, and is a standard assumption when studying the effect of perturbations. For this setting, determinative power of nodes and perturbation-related measures are properties of single functions; hence the analysis of the BN reduces to the analysis of single functions. As the main tool, we use Fourier analysis of Boolean functions. Fourier analytic techniques were first applied to Boolean networks by Kesseli et al. [7, 8]. They used spectral techniques to derive results related to Derrida plots of random BNs and the convergence of trajectories in random BNs. In this paper, we discuss mutual information as a measure of determinative power. Ribeiro et al. [9] considered the pairwise mutual information in time series of random Boolean networks. The setup considered here differs from the one studied in [9]: in [9], the functions are chosen at random, whereas here the functions are given, but the argument is random. Furthermore, we are interested in a measure of decisive power of subsets of inputs over the function's output, statistical dependencies, and connections to perturbations, whereas in [9] the pairwise mutual information in time series of random Boolean networks is studied.

Contributions. We argue that the mutual information between a set of nodes and the state of a node is a measure of the determinative power of this set of inputs. The argument is as follows: Mutual information is a quantity that measures the mutual dependence of random variables. If a set of inputs to a node and the state of this node are strongly mutually dependent, then this set can be viewed as having large determinative power over this node. To understand determinative power and mutual dependencies in Boolean networks better, we systematically study the mutual information of sets of inputs and the state of a node. We relate the mutual information to measures of perturbations, and prove that, maybe surprisingly, a set of inputs that is highly sensitive to perturbations need not have any determinative power at all. Conversely, an input that has determinative power must be sensitive to perturbations to some extent. These results are proven by expressing a Boolean function $f$ as a multilinear polynomial over the reals, i.e., considering the Fourier expansion of $f$. Then both measures of perturbations and of determinative power can be computed directly from the Fourier spectrum of a Boolean function. Specifically, we argue that the concentration of weight in the Fourier domain on sets of inputs characterizes a function in terms of tolerance to perturbations and determines the determinative power of nodes. Furthermore, we generalize a result given by Xiao and Massey [10] that gives a necessary and sufficient condition for statistical independence of sets of inputs in terms of the Fourier coefficients. This result can, e.g., be applied to decide for which classes of functions the algorithm presented in [11], which detects functional dependencies based on estimating mutual information, can in principle detect dependencies, and for which it fails. Moreover, we show that for a unate function, any relevant input and the function's output are statistically dependent. For unate functions we also prove a direct relation between the mutual information and the influence of a variable. The class of unate functions includes all linear threshold functions and describes functional dependencies in gene regulatory networks well [12]. As an application of the theoretical results in this paper, we show that mutual information can be used to identify the determinative nodes in the large-scale model of the control network of E. coli's metabolism [6].

Outline. The paper is organized as follows. Boolean networks and Fourier analysis of Boolean functions are reviewed in Section 2. In Section 3, influence and average sensitivity as measures of perturbations are reviewed, and their relation to the Fourier spectrum is discussed. In Section 4, we study the mutual information of sets of inputs and the function's output. In Section 5, we discuss the class of unate functions. Section 6 contains an analysis of the large-scale E. coli regulatory network, using the tools and ideas developed in previous sections.

2 Preliminaries 

In this section we review some standard facts about Boolean networks and Fourier analysis of 
Boolean functions and introduce notation. 

2.1 Boolean Networks 

A (synchronous) Boolean network (BN) can be viewed as a collection of $N$ nodes with memory. The state of a node $i$ is described by a binary state $x_i(t) \in \{-1,+1\}$ at discrete time $t \in \mathbb{N}$. Choosing the alphabet to be $\{-1,+1\}$ rather than $\{0,1\}$, as is more common in the literature on BNs, will turn out to be advantageous later. However, both choices are equivalent. The state of the network at time $t$ can be described by the vector $\mathbf{x}(t) = (x_1(t), \dots, x_N(t)) \in \{-1,+1\}^N$. The network dynamics are defined by

$$x_i(t+1) = f_i(\mathbf{x}(t)), \tag{1}$$

where $f_i : \{-1,+1\}^N \to \{-1,+1\}$ is the Boolean function associated with node $i$. At time $t = 0$, an initial state $\mathbf{x}(0) = \mathbf{x}_0$ is chosen. In general, not all arguments $x_1, \dots, x_N$ of a function $f_i(\mathbf{x})$ need to be relevant. The variable $x_j$, $j \in \{1, \dots, N\}$, is relevant for $f_i$ if and only if there exists at least one $\mathbf{x} \in \{-1,+1\}^N$ such that changing $x_j$ to $-x_j$ also changes the function's value. In most of the Boolean network models in biology, the functions depend on a small subset of their arguments only. Furthermore, not every state must have a function associated with it; states can also be external inputs to the network.
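As an illustration of the dynamics described above, the following minimal sketch simulates a synchronous BN, assuming the standard update rule $x_i(t+1) = f_i(\mathbf{x}(t))$. The three-node network `toy_network` and all function names are hypothetical and chosen only for illustration; they do not appear in the paper.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, ...]               # one entry in {-1, +1} per node
NodeFunction = Callable[[State], int]


def step(state: State, functions: Sequence[NodeFunction]) -> State:
    """One synchronous update: every node i is set to f_i(x(t))."""
    return tuple(f(state) for f in functions)


def trajectory(x0: State, functions: Sequence[NodeFunction], steps: int) -> List[State]:
    """Return the list [x(0), x(1), ..., x(steps)]."""
    states = [x0]
    for _ in range(steps):
        states.append(step(states[-1], functions))
    return states


def and2(a: int, b: int) -> int:
    """AND in the {-1, +1} alphabet: +1 only if both arguments are +1."""
    return 1 if a == 1 and b == 1 else -1


# Hypothetical 3-node network (assumption, not taken from the paper).
toy_network: List[NodeFunction] = [
    lambda x: x[2],               # node 0 copies node 2
    lambda x: and2(x[0], x[2]),   # node 1 is the AND of nodes 0 and 2
    lambda x: -x[1],              # node 2 negates node 1
]

states = trajectory((1, 1, 1), toy_network, steps=3)
```

Each function receives the full state vector but reads only its relevant arguments, which mirrors the remark that functions typically depend on a small subset of the nodes.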

To investigate determinative power of nodes and effects of perturbations, we assume that each state is a random variable $X_i$ which follows the distribution $\Pr[X_i = x_i]$, $x_i \in \{-1,+1\}$. We also assume that the random variables $X_1, \dots, X_N$ are independent. This is a natural approach, since a probabilistic setup is needed to study determinative power and tolerance to perturbations. For a given network, the distribution $\Pr[X_i = x_i]$, $x_i \in \{-1,+1\}$, can be estimated by observing the state $x_i(t)$ for a sufficiently long time. The assumption of independence holds for networks with tree-like structure, but is not feasible for networks with a non-tree-like topology with strong local dependencies. In many relevant settings a BN has a tree-like topology, for instance the E. coli network, which is analyzed in Section 6. For a network with arbitrary topology but few local dependencies, assuming independence will lead to a small modeling error. However, major results concerning the analysis of BNs have been obtained under the assumptions stated above. For example, an important result about the spread of perturbations in random Boolean networks, the annealed approximation [13], has been obtained by assuming that the network size $N$ goes to infinity, which implies that there are no local dependencies and that the assumptions stated above apply.



2.2 Notation 

We use $[n]$ for the set $\{1, 2, \dots, n\}$, and all sets are subsets of $[n]$. With $\sum_{S \subseteq A}$ we mean the sum over all sets $S$ that are subsets of $A$. Throughout this paper, we use capital letters for random variables, e.g., $X$, and lower case letters for their realizations, e.g., $x$. Boldface letters denote vectors, e.g., $\mathbf{X}$ is a random vector, and $\mathbf{x}$ its realization. For a vector $\mathbf{x}$ and a set $A \subseteq [n]$, $\mathbf{x}_A$ denotes the subvector of $\mathbf{x}$ corresponding to the entries indexed by $A$.



2.3 Fourier Analysis of Boolean Functions 

In the following we give a short introduction to Fourier analysis of Boolean functions $f : \{-1,+1\}^n \to \{-1,+1\}$. Let $\mathbf{X} = (X_1, \dots, X_n)$ be a binary, product distributed random vector, i.e., the entries of $\mathbf{X}$ are independent random variables $X_i$, $i \in [n]$, with distribution $\Pr[X_i = x_i]$, $x_i \in \{-1,+1\}$. Then $f(\mathbf{X})$ is a random variable. Throughout this paper, probabilities $\Pr[\cdot]$ and expectations $\mathrm{E}[\cdot]$ are with respect to the distribution of $\mathbf{X}$. We denote $p_i = \Pr[X_i = 1]$, the standard deviation of $X_i$ as $\sigma_i = \sqrt{\mathrm{Var}(X_i)}$, where $\mathrm{Var}(X_i)$ denotes the variance of $X_i$, and the mean of $X_i$ as $\mu_i = \mathrm{E}[X_i]$. The inner product of the functions $f, g : \{-1,+1\}^n \to \{-1,+1\}$ with respect to the distribution of $\mathbf{X}$ is defined as

$$\langle f, g \rangle \triangleq \mathrm{E}[f(\mathbf{X})\, g(\mathbf{X})] = \sum_{\mathbf{x} \in \{-1,+1\}^n} \Pr[\mathbf{X} = \mathbf{x}]\, f(\mathbf{x})\, g(\mathbf{x}), \tag{2}$$

which induces the norm $\|f\| = \sqrt{\langle f, f \rangle}$. An orthonormal basis with respect to the distribution $\Pr[\mathbf{X} = \mathbf{x}]$ is given by the functions

$$\Phi_S(\mathbf{x}) = \prod_{i \in S} \frac{x_i - \mu_i}{\sigma_i}, \quad S \subseteq [n],\ S \neq \emptyset,$$

and

$$\Phi_S(\mathbf{x}) = 1, \quad S = \emptyset.$$

This basis was first proposed by Bahadur [14]. Thus, each function $f : \{-1,+1\}^n \to \{-1,+1\}$ can be uniquely expressed as

$$f(\mathbf{x}) = \sum_{S \subseteq [n]} \hat{f}(S)\, \Phi_S(\mathbf{x}), \tag{3}$$

where $\hat{f}(S) = \langle f, \Phi_S \rangle$ are the Fourier coefficients. Note that (3) is a representation of the function $f$ as a multilinear polynomial, and the Fourier coefficients are the coefficients of that polynomial. As an example consider the AND2 function, i.e., $f_{\mathrm{AND2}}(\mathbf{x}) = 1$ if $x_1 = x_2 = 1$ and $f_{\mathrm{AND2}}(\mathbf{x}) = -1$ for all other choices of $\mathbf{x}$. Suppose the input variables $X_1, X_2$ are uniformly distributed, i.e., $\mu_1 = \mu_2 = 0$ and $\sigma_1 = \sigma_2 = 1$. Then the Fourier coefficients are $\hat{f}(\emptyset) = -1/2$, $\hat{f}(\{1\}) = 1/2$, $\hat{f}(\{2\}) = 1/2$ and $\hat{f}(\{1,2\}) = 1/2$. Hence (3) becomes

$$f_{\mathrm{AND2}}(\mathbf{x}) = -\frac{1}{2} + \frac{1}{2} x_1 + \frac{1}{2} x_2 + \frac{1}{2} x_1 x_2.$$

As a second example consider the parity function of 2 variables, i.e., the XOR function. PARITY2 is defined as $f_{\mathrm{PARITY2}}(\mathbf{x}) = 1$ if $x_1 = x_2 = 1$ or if $x_1 = x_2 = -1$, and $f_{\mathrm{PARITY2}}(\mathbf{x}) = -1$ for all other choices of $\mathbf{x}$. Written as a polynomial, this is $f_{\mathrm{PARITY2}}(\mathbf{x}) = x_1 x_2$. Let us conclude this section by listing properties of the Fourier transform which are used frequently throughout this paper.

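Decomposition: Let $A \subseteq [n]$ and $S \subseteq A$. Denote $\bar{S} = A \setminus S$; then

$$\Phi_A(\mathbf{x}) = \Phi_S(\mathbf{x})\, \Phi_{\bar{S}}(\mathbf{x}).$$

Orthonormality: For $A, B \subseteq [n]$,

$$\mathrm{E}[\Phi_A(\mathbf{X})\, \Phi_B(\mathbf{X})] = \begin{cases} 1, & \text{if } A = B, \\ 0, & \text{otherwise.} \end{cases}$$

Parseval's identity: For $f : \{-1,+1\}^n \to \{-1,+1\}$,

$$\mathrm{E}[f(\mathbf{X})^2] = \|f\|^2 = \sum_{S \subseteq [n]} \hat{f}(S)^2 = 1.$$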
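The Fourier coefficients can be computed by brute-force enumeration for small $n$. The following sketch assumes the basis functions $\Phi_S(\mathbf{x}) = \prod_{i \in S} (x_i - \mu_i)/\sigma_i$ with $\mu_i = 2p_i - 1$ and $\sigma_i = \sqrt{1 - \mu_i^2}$; the function names are hypothetical, indices are 0-based, and every $p_i$ is assumed to lie strictly between 0 and 1.

```python
from itertools import combinations, product
from math import prod, sqrt


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def basis(S, x, p):
    """Phi_S(x) = prod_{i in S} (x_i - mu_i) / sigma_i, with Phi_emptyset = 1."""
    value = 1.0
    for i in S:
        mu = 2 * p[i] - 1
        value *= (x[i] - mu) / sqrt(1 - mu * mu)
    return value


def fourier_coefficients(f, p):
    """Return {S: <f, Phi_S>} for every subset S of {0, ..., n-1} (as a tuple)."""
    n = len(p)
    points = list(product((-1, 1), repeat=n))
    return {
        S: sum(prob(x, p) * f(x) * basis(S, x, p) for x in points)
        for k in range(n + 1)
        for S in combinations(range(n), k)
    }


def and2(x):
    return 1 if x == (1, 1) else -1


uniform = fourier_coefficients(and2, (0.5, 0.5))   # the AND2 example
biased = fourier_coefficients(and2, (0.3, 0.7))    # a product distribution
parseval_sum = sum(c * c for c in biased.values()) # should equal 1
```

For uniform inputs this reproduces the AND2 coefficients stated in the text, and `parseval_sum` checks Parseval's identity for a non-uniform product distribution.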



3 Influence and Average Sensitivity 

In this section we discuss measures of perturbations and their relation to the Fourier spectrum. We 
start with the influence which is a measure of the perturbation of a single input. 

Definition 1 ([15]). Define the influence of variable $i$ on the function $f$ as

$$I_i(f) = \Pr[f(\mathbf{X}) \neq f(\mathbf{X} \oplus e_i)],$$

where $\mathbf{x} \oplus e_i$ is the vector obtained from $\mathbf{x}$ by flipping its $i$th entry.

By definition, the influence of variable $i$ is the probability that a perturbation of input $i$, i.e., flipping input $i$, changes the function's output. Hence influence captures the effect of a single perturbation of input $i$. In [15], the authors considered a setting where $n$ processors were agreeing on some common bit as the output of a Boolean function. They used the notion of influence as a measure to capture the probability that a single faulty processor can alter the result of the common bit, if the faulty processor knows the bits announced by all other processors. This illustrates that influence can be viewed as the capability of input $i$ to change the output of function $f$. In Boolean networks, usually the sum of all influences, i.e., the average sensitivity, is studied.

Definition 2. The average sensitivity of $f$ to the variables in set $A$ is defined as

$$I_A(f) = \sum_{i \in A} I_i(f).$$

The average sensitivity of $f$ is defined as $\mathrm{as}(f) = I_{[n]}(f)$.

$I_A(f)$ captures whether flipping an input, chosen uniformly at random from the set $A$, affects the function's output. Most commonly all inputs are taken into account, i.e., the average sensitivity $\mathrm{as}(f)$ is studied, because it measures the effect of the perturbation of a single input. As an example, for the previously introduced AND2 and PARITY2 functions with uniformly distributed inputs we have $\mathrm{as}(f_{\mathrm{PARITY2}}) = 2$ and $\mathrm{as}(f_{\mathrm{AND2}}) = 1$; hence, the PARITY2 function is more sensitive to single perturbations than the AND2 function is. In the following we discuss the Fourier spectrum of functions that are tolerant to perturbations. To this end, the influence and average sensitivity are given in terms of Fourier coefficients as follows.

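Proposition 1. For any Boolean function $f$,

$$I_i(f) = \frac{1}{\sigma_i^2} \sum_{S \subseteq [n] :\, i \in S} \hat{f}(S)^2.$$

Proposition 2. For any Boolean function $f$,

$$I_A(f) = \sum_{S \subseteq [n]} \hat{f}(S)^2 \sum_{i \in S \cap A} \frac{1}{\sigma_i^2}. \tag{4}$$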

Proposition 1 appears in [16, Lem. 4.1]. Proposition 2 follows directly from Proposition 1 and the definition of $I_A(f)$. From (4) it becomes apparent that the average sensitivity $\mathrm{as}(f)$ is large if the Fourier weight, i.e., the squared Fourier coefficient $\hat{f}(S)^2$, is concentrated on the coefficients of high degree $d = |S|$. Parseval's identity implies that the terms $\hat{f}(S)^2$ for which the degree $d = |S|$ is small must then be small. The following example demonstrates this: Consider the AND3 function, i.e., $f_{\mathrm{AND3}}(x_1, x_2, x_3) = 1$ if and only if $x_1 = x_2 = x_3 = 1$. The average sensitivity of the AND3 function is $\mathrm{as}(f_{\mathrm{AND3}}) = 0.75$. Hence, $f_{\mathrm{AND3}}$ is tolerant to perturbations. In Figure 1 we see that the spectrum of $f_{\mathrm{AND3}}$ is concentrated on the coefficients of low degree. In contrast, consider the parity of three variables, $f_{\mathrm{PARITY3}}(x_1, x_2, x_3) = x_1 x_2 x_3$, for which $\mathrm{as}(f_{\mathrm{PARITY3}}) = 3$. Hence, PARITY3 is very sensitive to perturbations. Observe in Figure 1 that the spectrum of the PARITY3 function is maximally concentrated on the coefficient of highest degree: $\hat{f}(\{1, 2, 3\}) = 1$.
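The average sensitivities quoted in these examples can be reproduced by direct enumeration. The sketch below assumes that the influence is $I_i(f) = \Pr[f(\mathbf{X}) \neq f(\mathbf{X} \oplus e_i)]$ and that $I_A(f)$ is the sum of the influences of the variables in $A$; the function names are hypothetical and indices are 0-based.

```python
from itertools import product
from math import prod


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def influence(f, p, i):
    """I_i(f): probability that flipping entry i changes the output of f."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total


def average_sensitivity(f, p, A=None):
    """I_A(f) as the sum of I_i(f) over i in A; A=None uses all variables, i.e. as(f)."""
    indices = range(len(p)) if A is None else A
    return sum(influence(f, p, i) for i in indices)


def and_n(x):
    return 1 if all(v == 1 for v in x) else -1


def parity(x):
    return prod(x)


as_and2 = average_sensitivity(and_n, (0.5,) * 2)
as_parity2 = average_sensitivity(parity, (0.5,) * 2)
as_and3 = average_sensitivity(and_n, (0.5,) * 3)
as_parity3 = average_sensitivity(parity, (0.5,) * 3)
```

Passing a non-uniform `p` gives the corresponding quantities for a product distribution.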

It can be seen from (4) that for a function $f$ to be tolerant to single perturbations, i.e., to have a small average sensitivity, the Fourier weight must be concentrated on coefficients with low degree. When $\mathbf{X}$ is uniformly distributed, this is the case if $f$ is strongly biased (i.e., if $f(\mathbf{x}) = a$ for most inputs $\mathbf{x}$, where $a \in \{-1, 1\}$ is a constant). Then the coefficient $\hat{f}(S)$ of smallest degree $d = |S|$, which is $\hat{f}(\emptyset)$, is large, and the other coefficients must be small according to Parseval's identity. If a function $f$ depends on few variables, it follows from (4) that the average sensitivity is small, as the degree $d = |S|$ is small for all nonzero $\hat{f}(S)$. This implies that a function which depends on few variables is tolerant to perturbations, which is in accordance with the observations and results of Kauffman; he found that if a random BN operates in the ordered regime, then the functions in the network depend on average on few variables.

Figure 1: The Fourier spectrum of the AND3 and PARITY3 function.

So far, we only discussed single perturbations. However, the discussion also carries over to other noise models, because if a function is tolerant to single perturbations, its noise sensitivity is small. The noise sensitivity of a Boolean function is defined as the probability that the function's output changes if each input is flipped independently with probability $\epsilon$. For uniformly distributed $\mathbf{X}$, $\epsilon\, \mathrm{as}(f)$ is an upper bound for the noise sensitivity, and for small values of $\epsilon$, $\epsilon\, \mathrm{as}(f)$ actually approximates the noise sensitivity well. For a different distribution of $\mathbf{X}$ and a slightly different noise model, Schober [17] found that $\epsilon\, \mathrm{as}(f)$ still upper bounds the noise sensitivity. This result was generalized in [18].
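The relation between noise sensitivity and $\epsilon\,\mathrm{as}(f)$ can be checked numerically for small functions. The sketch below covers only the uniform case and assumes that each input is flipped independently with probability `eps`; the function names and the value `eps = 0.01` are illustrative assumptions.

```python
from itertools import product
from math import prod


def noise_sensitivity(f, n, eps):
    """Pr[f(X) != f(Y)] for uniform X, where Y flips each entry of X w.p. eps."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for flips in product((False, True), repeat=n):
            y = tuple(-xi if flip else xi for xi, flip in zip(x, flips))
            if f(x) != f(y):
                k = sum(flips)  # number of flipped entries
                total += 0.5 ** n * eps ** k * (1 - eps) ** (n - k)
    return total


def average_sensitivity_uniform(f, n):
    """as(f) for uniform X: sum over i of Pr[flipping entry i changes f]."""
    total = 0.0
    for x in product((-1, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                total += 0.5 ** n
    return total


def and3(x):
    return 1 if x == (1, 1, 1) else -1


def parity3(x):
    return prod(x)


eps = 0.01
ns_and3 = noise_sensitivity(and3, 3, eps)
ns_parity3 = noise_sensitivity(parity3, 3, eps)
bound_and3 = eps * average_sensitivity_uniform(and3, 3)
bound_parity3 = eps * average_sensitivity_uniform(parity3, 3)
```

For this small `eps`, both noise sensitivities stay below the bound and within a few percent of it, in line with the statement that the bound is a good approximation for small $\epsilon$.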

4 Mutual Information and Uncertainty 

In this section, we study the mutual information $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ between $f(\mathbf{X})$ and a subset of variables $\mathbf{X}_A$, where $\mathbf{X}_A$ consists of the entries of $\mathbf{X}$ corresponding to the indices in the set $A \subseteq [n]$. We argue that $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ is a measure of the determinative power of $\mathbf{X}_A$ over $f(\mathbf{X})$. If $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) = 0$, then $\mathbf{X}_A$ and $f(\mathbf{X})$ are statistically independent, even when $\mathbf{X}_A$ are relevant variables. We will characterize functions for which this is the case in terms of Fourier coefficients. This provides insight into the functions for which sets of inputs and the function's output are statistically dependent.

Before giving a formal definition of mutual information, let us start with the following example. Consider the PARITY2 function where the inputs $X_1, X_2$ are uniformly distributed. Intuitively, if $X_1$ has determinative power, knowledge about $X_1$ should provide us with information about the random variable $f_{\mathrm{PARITY2}}(\mathbf{X})$. Suppose we know the value of $X_1$, say $X_1 = 1$. Since $f_{\mathrm{PARITY2}}(\mathbf{X}) = X_1 X_2$, we have with $\Pr[X_2 = 1] = 1/2$ that $\Pr[f_{\mathrm{PARITY2}}(\mathbf{X}) = 1] = \Pr[f_{\mathrm{PARITY2}}(\mathbf{X}) = 1 \mid X_1 = 1] = 1/2$. Hence, knowing the value of $X_1$ does not help to predict the value of $f_{\mathrm{PARITY2}}(\mathbf{X})$. Therefore $X_1$ has no determinative power over $f_{\mathrm{PARITY2}}(\mathbf{X})$. We indeed have $\mathrm{MI}(f_{\mathrm{PARITY2}}(\mathbf{X}); X_1) = 0$.

Let us now define mutual information. Mutual information is the reduction of uncertainty of a random variable $Y$ due to the knowledge of $X$; therefore we first need to define a measure of uncertainty, which is the entropy of a random variable. As a reference for the following definitions see [19].

Definition 3. The entropy $H(X)$ of a discrete random variable $X$ with alphabet $\mathcal{X}$ is defined as

$$H(X) \triangleq -\sum_{x \in \mathcal{X}} \Pr[X = x] \log_2 \Pr[X = x].$$

Definition 4. The conditional entropy $H(Y|X)$ of a pair of discrete and jointly distributed random variables $(Y, X)$ is defined as

$$H(Y|X) \triangleq \sum_{x \in \mathcal{X}} \Pr[X = x]\, H(Y|X = x).$$

Definition 5. The mutual information $\mathrm{MI}(Y; X)$ is the reduction of uncertainty of the random variable $Y$ due to knowledge of $X$,

$$\mathrm{MI}(Y; X) = H(Y) - H(Y|X).$$

For a binary random variable $X$ with $p = \Pr[X = 1]$, we have $H(X) = h(p)$, where $h(p)$ is the binary entropy function, defined as

$$h(p) = -p \log_2 p - (1 - p) \log_2 (1 - p). \tag{5}$$
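For small $n$, the mutual information between $f(\mathbf{X})$ and a subset of inputs can be evaluated by enumeration directly from Definitions 3 to 5. The sketch below assumes independent inputs with $\Pr[X_i = +1] = p_i$; the function names are hypothetical and the index set `A` is a tuple of 0-based indices.

```python
from itertools import product
from math import log2, prod


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def h(q):
    """Binary entropy function (5), with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def conditional_entropy(f, p, A):
    """H(f(X) | X_A) by enumeration; A = () gives the entropy H(f(X))."""
    mass = {}       # Pr[X_A = x_A]
    mass_plus = {}  # Pr[X_A = x_A and f(X) = +1]
    for x in product((-1, 1), repeat=len(p)):
        key = tuple(x[i] for i in A)
        mass[key] = mass.get(key, 0.0) + prob(x, p)
        if f(x) == 1:
            mass_plus[key] = mass_plus.get(key, 0.0) + prob(x, p)
    return sum(m * h(mass_plus.get(key, 0.0) / m) for key, m in mass.items() if m > 0)


def mutual_information(f, p, A):
    """MI(f(X); X_A) = H(f(X)) - H(f(X) | X_A)."""
    return conditional_entropy(f, p, ()) - conditional_entropy(f, p, A)


def parity2(x):
    return x[0] * x[1]


def and2(x):
    return 1 if x == (1, 1) else -1


def dictator(x):
    return x[0]


uniform = (0.5, 0.5)
mi_parity = mutual_information(parity2, uniform, (0,))
mi_dictator = mutual_information(dictator, uniform, (0,))
mi_and = mutual_information(and2, uniform, (0,))
```

`mi_parity` reproduces the PARITY2 example (no determinative power of a single input), while a dictatorship function attains the maximum value of one bit.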

Mutual information is a measure of determinative power for the following reasons. Consider a single variable $X_i$ of the argument $\mathbf{X}$: If knowledge of $X_i$ reduces the uncertainty of $f(\mathbf{X})$, then $X_i$ determines the state of $f(\mathbf{X})$ to some extent, because then knowledge about the state of $X_i$ helps in predicting $f(\mathbf{X})$. Another property of mutual information, which we expect from a measure of determinative power, is that not all variables can have large determinative power, because of the inequality

$$\sum_{i=1}^{n} \mathrm{MI}(f(\mathbf{X}); X_i) \le \mathrm{MI}(f(\mathbf{X}); \mathbf{X}) \le 1, \tag{6}$$

which follows from the chain rule of mutual information (as a reference see [19]) and independence of the variables $X_i$, $i \in [n]$. Hence, if $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, i.e., close to 1, we can be sure that $X_i$ has some determinative power over $f(\mathbf{X})$, since (6) implies that $\mathrm{MI}(f(\mathbf{X}); X_j)$ for $j \neq i$ must then be small. That is one reason why influence is not a measure of determinative power: each variable can have large influence; consider, e.g., the parity function, where each input has influence 1. If variable $i$ has large influence, this just implies that input $i$ has power to change the output, but not to determine it.

In the following we represent the mutual information in terms of the Fourier coefficients and then relate the mutual information to the influence. We first consider single variables, i.e., $\mathrm{MI}(f(\mathbf{X}); X_i)$, and then the general case of a set of variables, i.e., $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$.

4.1 Single Variables 

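Figure 2: $\mathrm{MI}(f(\mathbf{X}); X_i)$ as a function of $\hat{f}(\{i\})$ and $\hat{f}(\emptyset)$ for $p_i = 0.3$.

$\mathrm{MI}(f(\mathbf{X}); X_i)$ has previously been studied as information gain, a measure of "goodness" for split variables in greedy tree learners [20], and also appears under the name informativeness as a measure of voting power [21]. First, we discuss $\mathrm{MI}(f(\mathbf{X}); X_i)$ and its relation to the influence. We start by expressing the mutual information in terms of Fourier coefficients as

$$\mathrm{MI}(f(\mathbf{X}); X_i) = h\!\left(\tfrac{1}{2}\bigl(1 + \hat{f}(\emptyset)\bigr)\right) - p_i\, h\!\left(\tfrac{1}{2}\Bigl(1 + \hat{f}(\emptyset) + \hat{f}(\{i\})\, \tfrac{1 - \mu_i}{\sigma_i}\Bigr)\right) - (1 - p_i)\, h\!\left(\tfrac{1}{2}\Bigl(1 + \hat{f}(\emptyset) + \hat{f}(\{i\})\, \tfrac{-1 - \mu_i}{\sigma_i}\Bigr)\right), \tag{7}$$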

which follows from Theorem 5, given in Section 4.3. The mutual information $\mathrm{MI}(f(\mathbf{X}); X_i)$ depends only on $\hat{f}(\{i\})$, $\hat{f}(\emptyset)$ and $p_i$. In contrast, the influence $I_i(f)$ is a function of the coefficients $\{\hat{f}(S) : S \subseteq [n], i \in S\}$. In Figure 2 we depict $\mathrm{MI}(f(\mathbf{X}); X_i)$ for $p_i = 0.3$ as a function of $\hat{f}(\{i\})$ and $\hat{f}(\emptyset)$. It can be seen that $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$, i.e., $f(\mathbf{X})$ and $X_i$ are statistically independent, if and only if $\hat{f}(\{i\}) = 0$. That can be formalized as follows. $\mathrm{MI}(f(\mathbf{X}); X_i)$ is convex in $\hat{f}(\{i\})$. This can be proven by taking the second derivative of (7) and observing that it is larger than zero for all pairs of values $(\hat{f}(\emptyset), \hat{f}(\{i\}))$ for which $\mathrm{MI}(f(\mathbf{X}); X_i)$ is defined. Next, from (7) we see that $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$ if $\hat{f}(\{i\}) = 0$; hence it follows that $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$ if and only if $\hat{f}(\{i\}) = 0$, which proves the following corollary.

Corollary 1. Let $f$ be a Boolean function and $\mathbf{X}$ be product distributed. $X_i$ and $f(\mathbf{X})$ are statistically independent if and only if $\hat{f}(\{i\}) = 0$.
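The dependence of the single-variable mutual information on only $\hat{f}(\emptyset)$, $\hat{f}(\{i\})$ and $p_i$ can be checked numerically. The sketch below assumes that the conditional probability of $f(\mathbf{X}) = 1$ given $X_i = x_i$ equals $\tfrac{1}{2}\bigl(1 + \hat{f}(\emptyset) + \hat{f}(\{i\})(x_i - \mu_i)/\sigma_i\bigr)$, which is how (7) is read here, and compares the resulting expression with a direct enumeration. All function names are hypothetical.

```python
from itertools import product
from math import log2, prod, sqrt


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def h(q):
    """Binary entropy; arguments outside (0, 1) (e.g. rounding noise) map to 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def low_order_coefficients(f, p, i):
    """Return (f^(emptyset), f^({i})) by enumeration."""
    mu = 2 * p[i] - 1
    sigma = sqrt(1 - mu * mu)
    points = list(product((-1, 1), repeat=len(p)))
    f0 = sum(prob(x, p) * f(x) for x in points)
    fi = sum(prob(x, p) * f(x) * (x[i] - mu) / sigma for x in points)
    return f0, fi


def mi_from_fourier(f0, fi, p_i):
    """Single-variable MI written in terms of f^(emptyset), f^({i}) and p_i."""
    mu = 2 * p_i - 1
    sigma = sqrt(1 - mu * mu)
    given_plus = 0.5 * (1 + f0 + fi * (1 - mu) / sigma)    # Pr[f = 1 | X_i = +1]
    given_minus = 0.5 * (1 + f0 + fi * (-1 - mu) / sigma)  # Pr[f = 1 | X_i = -1]
    return h(0.5 * (1 + f0)) - p_i * h(given_plus) - (1 - p_i) * h(given_minus)


def mi_by_enumeration(f, p, i):
    """MI(f(X); X_i) = H(f(X)) - H(f(X) | X_i), computed from the definitions."""
    points = list(product((-1, 1), repeat=len(p)))
    p_plus = sum(prob(x, p) for x in points if f(x) == 1)
    cond = 0.0
    for value, weight in ((1, p[i]), (-1, 1 - p[i])):
        q = sum(prob(x, p) for x in points if x[i] == value and f(x) == 1) / weight
        cond += weight * h(q)
    return h(p_plus) - cond


def and2(x):
    return 1 if x == (1, 1) else -1


def parity2(x):
    return x[0] * x[1]


p = (0.3, 0.6)
f0, fi = low_order_coefficients(and2, p, 0)
mi_formula = mi_from_fourier(f0, fi, p[0])
mi_direct = mi_by_enumeration(and2, p, 0)
parity_f0, parity_fi = low_order_coefficients(parity2, (0.5, 0.5), 0)
```

For PARITY2 with uniform inputs the first-order coefficient vanishes and so does the mutual information, as Corollary 1 states.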

Corollary 1 also follows from a more general result, Theorem 3, which is presented later. We already verified Corollary 1 in the introductory example: For the PARITY2 function we found $\mathrm{MI}(f_{\mathrm{PARITY2}}(\mathbf{X}); X_1) = 0$ and $\hat{f}(\{1\}) = 0$. From Figure 2 it can be seen that the larger $|\hat{f}(\{i\})|$, the larger $\mathrm{MI}(f(\mathbf{X}); X_i)$ becomes. Formally, it follows from the convexity and Corollary 1 that $\mathrm{MI}(f(\mathbf{X}); X_i)$ is increasing in $|\hat{f}(\{i\})|$. Hence $X_i$ has large determinative power, i.e., $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, if and only if $|\hat{f}(\{i\})|$ is large (i.e., close to one). $|\hat{f}(\{i\})|$ is trivially maximized for the dictatorship function, i.e., for $f(\mathbf{x}) = x_i$, or its negation, i.e., $f(\mathbf{x}) = -x_i$. The output $f(\mathbf{X})$ of the dictatorship function is fully determined by $X_i$.

Let us now relate the mutual information to the influence:

Corollary 2. For any Boolean function $f$, for any product distributed $\mathbf{X}$,

$$I_i(f) \ge \sigma_i^2 \bigl( \mathrm{MI}(f(\mathbf{X}); X_i) - \Psi(\mathrm{Var}(f(\mathbf{X}))) \bigr),$$

with $\Psi(x) = x^{1/\ln 4} - x$.

Proof. Follows from Theorem 2, given in Section 4.2. □



The term $\Psi(\mathrm{Var}(f(\mathbf{X})))$ should be understood as an error term which satisfies $0 \le \Psi(\mathrm{Var}(f(\mathbf{X}))) < 0.12$ and which is close to zero for settings of interest, as the following argument explains. Corollary 2 is not of interest when $\mathrm{Var}(f(\mathbf{X}))$ is small, since then $f(\mathbf{X})$ is close to a constant function (i.e., close to $f(\mathbf{X}) = 1$ or $f(\mathbf{X}) = -1$), and $I_i(f)$ and $\mathrm{MI}(f(\mathbf{X}); X_i)$ must both be small, i.e., close to zero. Hence Corollary 2 is of interest when $\mathrm{Var}(f(\mathbf{X}))$ is large, i.e., close to 1, and then the term $\Psi(\mathrm{Var}(f(\mathbf{X})))$ is small; e.g., for $\mathrm{Var}(f(\mathbf{X})) \ge 0.8$, $\Psi(\mathrm{Var}(f(\mathbf{X})))$ is at most about 0.05. Corollary 2 gives a lower bound on the influence of a variable in terms of the mutual information of that variable. Hence, if $\mathrm{MI}(f(\mathbf{X}); X_i)$ is large, then $I_i(f)$ is also large. That proves the intuitive idea that if an input determines $f(\mathbf{X})$ to some extent, the function also has to be sensitive to errors in this input. Conversely, as mentioned previously, an input $i$ can have large influence and still $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$: For the PARITY2 function we have $I_i(f) = 1$ and $\mathrm{MI}(f(\mathbf{X}); X_i) = 0$.

Interestingly, the influence also has an information theoretic interpretation. The following theorem generalizes Theorem 1 in [21].

Theorem 1. For any Boolean function $f$, for any product distributed $\mathbf{X}$,

$$I_i(f) = \frac{H\bigl(f(\mathbf{X}) \mid \mathbf{X}_{[n] \setminus \{i\}}\bigr)}{H(X_i)}.$$

Proof. See Appendix B. For uniformly distributed $\mathbf{X}$, a proof appears in [21]. □

Theorem 1 shows that the influence of a variable is a measure for the uncertainty of the function's output that remains if all variables except variable $i$ are set.
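The information theoretic reading of the influence can be verified on a small example. The sketch below assumes the relation $I_i(f) = H(f(\mathbf{X}) \mid \mathbf{X}_{[n]\setminus\{i\}}) / H(X_i)$ and evaluates both sides by enumeration for AND3 under a product distribution; the function names and the probabilities in `p` are illustrative assumptions.

```python
from itertools import product
from math import log2, prod


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def h(q):
    """Binary entropy function with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def influence(f, p, i):
    """I_i(f): probability that flipping entry i changes the output of f."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(x) != f(flipped):
            total += prob(x, p)
    return total


def entropy_given_all_but(f, p, i):
    """H(f(X) | all variables except X_i), enumerating the other n-1 entries."""
    total = 0.0
    for rest in product((-1, 1), repeat=len(p) - 1):
        plus = rest[:i] + (1,) + rest[i:]     # full vector with x_i = +1
        minus = rest[:i] + (-1,) + rest[i:]   # full vector with x_i = -1
        pr_rest = prob(plus, p) / p[i]        # probability of the other entries
        q = p[i] * (f(plus) == 1) + (1 - p[i]) * (f(minus) == 1)  # Pr[f = 1 | rest]
        total += pr_rest * h(q)
    return total


def and3(x):
    return 1 if x == (1, 1, 1) else -1


p = (0.3, 0.5, 0.8)
influences = [influence(and3, p, i) for i in range(3)]
ratios = [entropy_given_all_but(and3, p, i) / h(p[i]) for i in range(3)]
```

Once all other variables are fixed, the output is either constant or a copy (or negation) of $X_i$, which is why the remaining uncertainty is the influence times $H(X_i)$.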

4.2 A Set of Variables 

In this part we discuss $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$, where $A$ is an arbitrary subset of $[n]$.

Let us first consider the case where $A = [n]$. Then $\mathbf{X}_A = \mathbf{X}$ and $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) = H(f(\mathbf{X}))$. It follows that $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ is maximized for $\Pr[f(\mathbf{X}) = 1] = 1/2$, i.e., if the variance of $f(\mathbf{X})$ is 1. This is the case if $\hat{f}(\emptyset) = 0$. In general, the closer $\hat{f}(\emptyset)$ is to zero, the larger the mutual information between a function's output and all its inputs.

From now on, let $A$ be an arbitrary subset of $[n]$. In the following we relate the average sensitivity of the inputs indexed by the set $A$ to the mutual information.

Theorem 2. For any Boolean function $f$, for any product distributed $\mathbf{X}$,

$$I_A(f) \ge \min_{i \in A} \bigl(\sigma_i^2\bigr) \bigl( \mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A) - \Psi(\mathrm{Var}(f(\mathbf{X}))) \bigr) \tag{8}$$

with

$$\Psi(x) \triangleq x^{1/\ln 4} - x. \tag{9}$$

Proof. See Appendix C. □

As explained previously, the term $\Psi(\mathrm{Var}(f(\mathbf{X})))$ should be understood as an error term. Theorem 2 shows that a large value of $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ implies that $f$ must be sensitive to perturbations of the variables $\mathbf{X}_A$. Moreover, if $I_A(f)$ is small, i.e., if $f$ is tolerant to perturbations of the variables $\mathbf{X}_A$, then $\mathrm{MI}(f(\mathbf{X}); \mathbf{X}_A)$ must be small, i.e., the variables $\mathbf{X}_A$ do not have large determinative power. For the case that $A = [n]$, Theorem 2 states that the average sensitivity is lower-bounded by $\mathrm{MI}(f(\mathbf{X}); \mathbf{X})$ minus some small term.

The following theorem characterizes statistical independence of $f(\mathbf{X})$ and a set of its arguments $\mathbf{X}_A$ in terms of the Fourier coefficients. This result generalizes a theorem derived by Xiao and Massey [10] from uniform to product distributed $\mathbf{X}$.

Theorem 3. Let $A \subseteq [n]$ be fixed, $f$ be a Boolean function and $\mathbf{X}$ be product distributed. Then $f(\mathbf{X})$ and the inputs $\mathbf{X}_A = \{X_i : i \in A\}$ are statistically independent if and only if

$$\hat{f}(S) = 0 \quad \text{for all } S \subseteq A,\ S \neq \emptyset.$$

Proof. See Appendix D. For uniformly distributed $\mathbf{X}$, i.e., $\Pr[X_i = 1] = 1/2$ for all $i \in [n]$, Theorem 3 has been derived by Xiao and Massey [10]. The proof provided here follows from an application of Lemma 1 in Appendix A, whereas the proof for uniformly distributed $\mathbf{X}$ given in [10] relies on the Xiao-Massey lemma. □

Theorem 3 shows that a function and small sets of its inputs are statistically independent if the spectrum is concentrated on the coefficients of high degree $d = |S|$. The most prominent example is the parity function of $n$ variables, i.e., $f_{\mathrm{PARITY}n}(\mathbf{x}) = x_1 x_2 \cdots x_n$: For uniformly distributed $\mathbf{X}$, each subset of $n - 1$ or fewer arguments and $f_{\mathrm{PARITY}n}(\mathbf{X})$ are statistically independent. If the spectrum of a function is concentrated on the coefficients of low degree $d = |S|$, which is the case for functions that are tolerant to perturbations, then small sets of inputs and the function's output are statistically dependent.
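The independence criterion can be illustrated by comparing the Fourier coefficients on nonempty subsets of $A$ with the mutual information obtained by enumeration. The sketch below assumes the basis $\Phi_S(\mathbf{x}) = \prod_{i \in S}(x_i - \mu_i)/\sigma_i$; the function names, the tolerance `tol` and the chosen distributions are illustrative assumptions.

```python
from itertools import combinations, product
from math import log2, prod, sqrt


def prob(x, p):
    """Pr[X = x] for independent X_i with Pr[X_i = +1] = p[i]."""
    return prod(p[i] if xi == 1 else 1 - p[i] for i, xi in enumerate(x))


def h(q):
    """Binary entropy function with h(0) = h(1) = 0."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * log2(q) - (1 - q) * log2(1 - q)


def coefficient(f, p, S):
    """Fourier coefficient f^(S) = E[f(X) * Phi_S(X)]."""
    total = 0.0
    for x in product((-1, 1), repeat=len(p)):
        phi = prod((x[i] - (2 * p[i] - 1)) / sqrt(1 - (2 * p[i] - 1) ** 2) for i in S)
        total += prob(x, p) * f(x) * phi
    return total


def conditional_entropy(f, p, A):
    """H(f(X) | X_A) by enumeration; A = () gives H(f(X))."""
    mass, mass_plus = {}, {}
    for x in product((-1, 1), repeat=len(p)):
        key = tuple(x[i] for i in A)
        mass[key] = mass.get(key, 0.0) + prob(x, p)
        if f(x) == 1:
            mass_plus[key] = mass_plus.get(key, 0.0) + prob(x, p)
    return sum(m * h(mass_plus.get(key, 0.0) / m) for key, m in mass.items() if m > 0)


def mutual_information(f, p, A):
    return conditional_entropy(f, p, ()) - conditional_entropy(f, p, A)


def independent_by_coefficients(f, p, A, tol=1e-12):
    """True if f^(S) vanishes for every nonempty S that is a subset of A."""
    return all(
        abs(coefficient(f, p, S)) <= tol
        for k in range(1, len(A) + 1)
        for S in combinations(A, k)
    )


def parity3(x):
    return prod(x)


A = (0, 1)
uniform = (0.5, 0.5, 0.5)
skewed = (0.5, 0.5, 0.9)  # only the third input is biased
indep_uniform = independent_by_coefficients(parity3, uniform, A)
indep_skewed = independent_by_coefficients(parity3, skewed, A)
mi_uniform = mutual_information(parity3, uniform, A)
mi_skewed = mutual_information(parity3, skewed, A)
```

With uniform inputs, two of the three arguments carry no information about PARITY3. Biasing the remaining input makes $\hat{f}(\{1,2\})$ nonzero and the same pair becomes informative, which shows why the product-distribution version of the criterion is needed.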

Theorem 3 is also of interest for analyzing algorithms that detect functional dependencies in a BN based on estimating the mutual information from observations of the network's states, such as the algorithm presented in [11]. For those, Theorem 3 allows one to predict for which classes of functions such an algorithm can succeed and for which it will fail. Theorem 3 also shows that in a BN model of a genetic regulatory network, a functional dependency between a gene and a regulator cannot be detected based on statistical dependence of a regulator $X_i$ and a gene's state $f_j(\mathbf{X})$, unless the model restricts the regulatory functions to those for which $|\hat{f}(\{i\})| > 0$ holds for each of the function's inputs.

4.3 Fourier Coefficients and Uncertainty

Let us finally relate the conditional entropy $H(f(\mathbf{X}) | \mathbf{X}_A)$ to a concentration of the Fourier weight on the coefficients $\{S : S \subseteq A\}$, where $A \subseteq [n]$.

Theorem 4. Let $f$ be a Boolean function, let $\mathbf{X}$ be product distributed and let $\mathbf{X}_A = \{X_i : i \in A\}$ be a fixed set of arguments, where $A \subseteq [n]$. Then

$$\Bigl(1 - \sum_{S \subseteq A} \hat{f}(S)^2\Bigr)^{1/\ln 4} \ge H(f(\mathbf{X}) | \mathbf{X}_A) \ge 1 - \sum_{S \subseteq A} \hat{f}(S)^2.$$

Proof. See Appendix E. □

Theorem 4 shows that i/(/(X)|Xyi) can be approximated with 1 — X^sca/C*^)^- further 
shows that -f/(/(X)|X^) is small, if the Fourier weight is concentrated on the variables in the set 
A, i.e., if J2scA fi'^)'^ is close to one. The conditional entropy if(/(X)|X/i) can also be directly 
expressed in terms of Fourier coefficients: 

Theorem 5. Let f be a Boolean function, let X be product distributed and let Xyi = {Xi : i £ A} 
be a fixed set of arguments, where A Q [n]. Then 



H{f{X)\XA)=E 



1 + E /(5)$5(Xa) 



SCA 



where h{-) is the binary entropy function as defined in ([s]). 

Proof. See Appendix |Fj For the special case of uniform distributed X, a proof appears in [22], in 
the the context of designing S-Boxes. □ 

Theorem 5 shows that the conditional entropy $H(f(X)|X_A)$ is a function of the coefficients
$\{\hat{f}(S) : S \subseteq A\}$ only. In contrast, the average sensitivity depends on all coefficients
$\{\hat{f}(S) : |S \cap A| > 0\}$.
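
The two statements above can be checked numerically on a small example. The sketch below assumes the standard product-distribution basis $\Phi_S(x) = \prod_{k \in S} (x_k - \mu_k)/\sigma_k$ with $\mu_k = \mathbb{E}[X_k]$ and $\sigma_k^2 = 1 - \mu_k^2$; the function names, the three-input majority function, and the probabilities in `p` are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations, product
from math import log, log2, prod, sqrt

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def weight(x, p):
    """Probability of x under the product distribution with Pr[X_k = +1] = p[k]."""
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def phi(S, x, p):
    """Assumed basis function Phi_S(x) = prod_{k in S} (x_k - mu_k) / sigma_k."""
    return prod((x[k] - (2 * p[k] - 1)) / sqrt(1 - (2 * p[k] - 1) ** 2) for k in S)

def fourier_on(f, p, A):
    """Coefficients E[f(X) Phi_S(X)] for all subsets S of A."""
    cube = list(product((-1, 1), repeat=len(p)))
    subsets = [S for r in range(len(A) + 1) for S in combinations(A, r)]
    return {S: sum(weight(x, p) * f(x) * phi(S, x, p) for x in cube) for S in subsets}

def cond_entropy(f, p, A):
    """H(f(X) | X_A) computed directly from the definition."""
    total = 0.0
    for xa in product((-1, 1), repeat=len(A)):
        rows = [x for x in product((-1, 1), repeat=len(p))
                if all(x[a] == v for a, v in zip(A, xa))]
        pa = sum(weight(x, p) for x in rows)
        p1 = sum(weight(x, p) for x in rows if f(x) == 1)
        total += pa * h(p1 / pa)
    return total

def cond_entropy_fourier(f, p, A):
    """Right-hand side of Theorem 5, evaluated from the coefficients on subsets of A."""
    coeff = fourier_on(f, p, A)
    total = 0.0
    for xa in product((-1, 1), repeat=len(A)):
        x = [1] * len(p)                 # entries outside A are never read by phi
        for a, v in zip(A, xa):
            x[a] = v
        pa = prod(p[a] if x[a] == 1 else 1 - p[a] for a in A)
        g = sum(c * phi(S, x, p) for S, c in coeff.items())
        total += pa * h((1 + g) / 2)
    return total

maj = lambda x: 1 if sum(x) > 0 else -1   # toy example: majority of three inputs
p, A = [0.3, 0.5, 0.7], (0, 1)
W = sum(c * c for c in fourier_on(maj, p, A).values())
H_direct = cond_entropy(maj, p, A)
H_fourier = cond_entropy_fourier(maj, p, A)
lower, upper = 1 - W, max(0.0, 1 - W) ** (1 / log(4))
print(lower, H_direct, H_fourier, upper)
```

The direct and the Fourier-based values should agree up to floating-point error, and both should lie between the two bounds of Theorem 4.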



5 Unate Functions 

In this section we discuss unate, i.e., locally monotone functions. 

Definition 6. A Boolean function $f$ is said to be unate in $x_i$ if, for some fixed $a_i \in \{-1,+1\}$
and for each $x = (x_1, \ldots, x_n) \in \{-1,+1\}^n$, $f(x_1, \ldots, x_i = -a_i, \ldots, x_n) \le f(x_1, \ldots, x_i = a_i, \ldots, x_n)$
holds. $f$ is said to be unate if $f$ is unate in each variable $x_i$, $i \in [n]$.

Each linear threshold function is unate, and most, if not all, regulatory interactions in a
biological network are considered to be unate. This can be deduced from [12, 23], and the basic
argument is the following: If an element acts either as a repressor or as an activator for some gene,
but never as both, then the function determining the gene's state is unate by definition. This
is a reasonable assumption for regulatory interactions [12, 23]. For unate functions, the following
interesting property holds.

Proposition 3. Let $f : \{-1,+1\}^n \to \{-1,+1\}$ be unate. Then

\[
\hat{f}(\{i\}) = a_i \sigma_i I_i(f), \quad \forall i \in [n], \tag{10}
\]

where $a_i \in \{-1,+1\}$ is the parameter as given in Definition 6.

Proof. Goes along the same lines as the proof for monotone functions in Lem. 4.5]. □



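Note that the converse does not hold: if (10) holds for each $i \in [n]$, $f$ is not necessarily unate.
Inserting (10) into the expression for $\mathrm{MI}(f(X); X_i)$ yields

\[
\mathrm{MI}(f(X); X_i) = h\Bigl(\tfrac{1}{2}\bigl(1 + \hat{f}(\emptyset)\bigr)\Bigr)
 - \mathbb{E}\Bigl[ h\Bigl(\tfrac{1}{2}\bigl(1 + \hat{f}(\emptyset) + a_i \sigma_i I_i(f)\, \Phi_{\{i\}}(X_i)\bigr)\Bigr)\Bigr], \tag{11}
\]

where the expectation in (11) is over $X_i$. Based on (11), the discussion from Section 4.1 applies
by using $\hat{f}(\{i\})$ and $a_i \sigma_i I_i(f)$ synonymously. Hence, for unate functions, the mutual information
$\mathrm{MI}(f; X_i)$ is increasing in the influence $I_i(f)$. Moreover, if $f$ is unate and $X_i$ is a relevant variable,
i.e., a variable on which the function actually depends, then $|\hat{f}(\{i\})| > 0$. From this fact and
the same arguments as given in Section 4.1, the following theorem follows: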

Theorem 6. Let $f : \{-1,+1\}^n \to \{-1,+1\}$ be unate. Then $\mathrm{MI}(f(X); X_i) \neq 0$ if and only if
$X_i$ is a relevant variable.

In a Boolean model of a biological regulatory network, this implies that if the functions in the 
network are unate, then a regulator and the target gene must be statistically dependent. 
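
The relation in Proposition 3 can be verified on a toy example. The sketch below assumes that the influence is $I_i(f) = \Pr[f(X) \neq f(X \oplus e_i)]$ and that $\sigma_i^2 = 1 - \mathbb{E}[X_i]^2$; the helper names and the example function (an activator combined with a repressor) are hypothetical and chosen only for illustration.

```python
from itertools import product
from math import prod, sqrt

def weight(x, p):
    """Probability of x under the product distribution with Pr[X_k = +1] = p[k]."""
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def with_value(x, i, v):
    """Copy of the tuple x with entry i set to v."""
    return x[:i] + (v,) + x[i + 1:]

def unate_sign(f, n, i):
    """Return a_i in {-1, +1} if f is unate in x_i (Definition 6), else None."""
    diffs = [f(with_value(x, i, 1)) - f(with_value(x, i, -1))
             for x in product((-1, 1), repeat=n)]
    if all(d >= 0 for d in diffs):
        return 1
    if all(d <= 0 for d in diffs):
        return -1
    return None

def influence(f, p, i):
    """Assumed definition: I_i(f) = Pr[f(X) != f(X with entry i flipped)]."""
    return sum(weight(x, p) for x in product((-1, 1), repeat=len(p))
               if f(x) != f(with_value(x, i, -x[i])))

def first_order_coeff(f, p, i):
    """Fourier coefficient of the singleton {i}: E[f(X) (X_i - mu_i) / sigma_i]."""
    mu = 2 * p[i] - 1
    sigma = sqrt(1 - mu * mu)
    return sum(weight(x, p) * f(x) * (x[i] - mu) / sigma
               for x in product((-1, 1), repeat=len(p)))

# Toy regulatory function: x0 activates, x1 represses (x0 AND NOT x1).
f = lambda x: 1 if (x[0] == 1 and x[1] == -1) else -1
p = [0.3, 0.6]
signs = [unate_sign(f, 2, i) for i in range(2)]
gaps = [abs(first_order_coeff(f, p, i)
            - signs[i] * sqrt(1 - (2 * p[i] - 1) ** 2) * influence(f, p, i))
        for i in range(2)]
parity_sign = unate_sign(lambda x: x[0] * x[1], 2, 0)   # parity is not unate
print(signs, gaps, parity_sign)
```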



6 E. coli Regulatory Network

In [6], the authors presented a complex computational model of the E. coli transcriptional regulatory
network that controls central parts of the E. coli metabolism. The network consists of 798 nodes
and 1160 edges. Of the nodes, 636 represent genes; of the remaining 162, most (103) are
external metabolites. The rest are stimuli and other state variables such as internal metabolites.
The network has a layered feed-forward structure, i.e., no feedback loops exist. The 133 elements
in the first layer can be viewed as the inputs of the system, and the elements in the following 7
layers are interacting genes representing the internal state of the system. Our investigations showed
that all functions are unate. This is a special property of the network, since if functions are chosen
uniformly at random, it is unlikely to sample a unate function, especially if the degree $n$ is large.
Hence, the properties which we derived in Section 5 apply.

6.1 Determinative Nodes in the E. coli Network 

In this section, we identify the input nodes that have large determinative power (we will define
what that means shortly), and show that a small number thereof reduces the uncertainty about
the network's state significantly. More specifically, we show that on average the entropy of the
controlled states, conditioned on a small set of determinative input nodes, is small. That implies
that the conditional probability of the controlled nodes to be either one or minus one is close to
one on average. We denote by $X = \{X_1, \ldots, X_n\}$, $n = 162$, the set of inputs of the feed-forward
network and assume they are independent and uniformly distributed. The remaining variables are
denoted by $Y = \{Y_1, \ldots, Y_m\}$, $m = 636$, and are given as a function of the input nodes and also
of other network states, i.e., $Y_i = f'_i(X, Y)$. For the analysis, the distributions of the random
variables $Y_1, \ldots, Y_m$ need to be computed, since some of these variables are arguments to other
functions. This can be circumvented by defining a collapsed network, i.e., a network where each
state in the network is given as a function of the input nodes, i.e., $Y_i = f_i(X)$. The collapsed
network is obtained by consecutively inserting functions into each other, until each function
depends only on the states of the nodes in the input layer, i.e., on $X$. After having obtained each state
$Y_i$ as a function of the input $X$, the determinative nodes can be identified.
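
The collapsing step described above can be sketched as a recursive substitution. The data layout (a dictionary mapping each non-input node to its function and argument names), the function name `collapse`, and the three-node toy network are assumptions for illustration; they do not reflect the format of the E. coli dataset.

```python
from itertools import product

def collapse(inputs, nodes):
    """Express every node of a feed-forward network as a function of the input tuple.

    inputs: list of input-node names (defines the order of entries in x)
    nodes:  dict name -> (function, list of argument names), acyclic by assumption
    """
    index = {name: k for k, name in enumerate(inputs)}
    collapsed = {}

    def build(name):
        if name in index:                      # input node: read its entry of x
            return lambda x, k=index[name]: x[k]
        if name not in collapsed:
            fn, args = nodes[name]
            parts = [build(a) for a in args]   # recursion ends because the graph is acyclic
            collapsed[name] = lambda x, fn=fn, parts=parts: fn(*(g(x) for g in parts))
        return collapsed[name]

    for name in nodes:
        build(name)
    return collapsed

# Toy network in the +/-1 encoding: AND = min, OR = max, NOT = negation.
inputs = ["x1", "x2", "x3"]
nodes = {
    "y1": (min, ["x1", "x2"]),          # y1 = x1 AND x2
    "y2": (lambda a: -a, ["y1"]),       # y2 = NOT y1
    "y3": (max, ["y2", "x3"]),          # y3 = y2 OR x3
}
collapsed = collapse(inputs, nodes)
table = {x: collapsed["y3"](x) for x in product((-1, 1), repeat=3)}
print(table)
```

After collapsing, `collapsed["y3"]` depends on the input tuple only, so its distribution can be computed without tracking the intermediate states.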

As argued in Section 4, $\mathrm{MI}(f_i(X); X_j)$ is a measure of the determinative power of $X_j$ over
$Y_i = f_i(X)$. This motivates defining the determinative power of input $X_j$ over the states in the
network as

\[
D(j) \triangleq \sum_{i=1}^{m} \mathrm{MI}(Y_i; X_j).
\]

Note that a small value of $D(j)$ only implies that the state $X_j$ alone does not have large determinative
power over states in the network. However, $\sum_{i=1}^{m} \mathrm{MI}(Y_i; X_j, X_k)$ can be large for some $j, k \in [n]$,
even though $D(j)$ and $D(k)$ are both small. We computed $D(j)$ for each input variable and found
that $D(j)$ is large only for some inputs, such as the variables o2-xt (36.9 bit), leu-Lxt (20.9 bit),
and glc-d-xt (19.3 bit) (here we adopted the names from the original dataset), but is small for
most other variables. It is natural to ask whether this can be explained by nodes with large
values of $D(j)$ having many outgoing edges, while most other nodes do not, i.e., the out-degree
of the input nodes is power law distributed. Indeed, in the E. coli network, the out-degree and
$D(j)$ are correlated to some extent. However, in theory, having a large out-degree does not
necessarily result in large values of $D(j)$. That is also what we observe in practice, e.g., the state
variable glc-d-xt has 99 outgoing edges, but D(glc-d_xt) = 19.3 bit, whereas variable o2-xt has
out-degree 72, but D(o2-xt) = 36.9 bit. For the following, denote by $\tau$ a permutation on $[n]$ such that
$D(X_{\tau(1)}) \ge D(X_{\tau(2)}) \ge \cdots \ge D(X_{\tau(n)})$, i.e., $\tau$ orders the input nodes in descending order of their
determinative power.
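
A minimal sketch of this ranking on a toy collapsed network is given below, assuming independent uniform inputs as in the text. The three state functions are invented for illustration (they are not taken from the E. coli model), and `mi` is a hypothetical brute-force helper.

```python
from collections import Counter
from itertools import product
from math import log2

def mi(f, n, A):
    """MI(f(X); X_A) in bits for independent uniform inputs X in {-1,+1}^n."""
    joint = Counter()
    for x in product((-1, 1), repeat=n):
        joint[(f(x), tuple(x[a] for a in A))] += 2.0 ** -n
    pf, pa = Counter(), Counter()
    for (fv, xa), w in joint.items():
        pf[fv] += w
        pa[xa] += w
    return sum(w * log2(w / (pf[fv] * pa[xa])) for (fv, xa), w in joint.items())

n = 3
states = [
    lambda x: x[0],                 # Y1 copies input 0
    lambda x: min(x[0], x[1]),      # Y2 = X0 AND X1
    lambda x: x[1] * x[2],          # Y3 = parity of X1 and X2
]

# Determinative power D(j) = sum_i MI(Y_i; X_j), and the descending order tau.
D = [sum(mi(y, n, (j,)) for y in states) for j in range(n)]
tau = sorted(range(n), key=lambda j: -D[j])

# Joint determinative power of inputs 1 and 2, which exceeds D(1) + D(2).
pair = sum(mi(y, n, (1, 2)) for y in states)
print(D, tau, pair)
```

In this toy network, input 2 has $D(2) = 0$ on its own, yet inputs 1 and 2 jointly determine the parity node, which illustrates why small individual values do not rule out a large joint determinative power.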

After having identified the most determinative nodes, we want to see whether knowledge about
a small set of those reduces the entropy of the network's states significantly, i.e., we are interested
in $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ as a function of $l$. The quantity $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ has an interesting
interpretation, which arises as a consequence of the so-called asymptotic equipartition property, see
[19]: Consider a sequence $y_1, \ldots, y_k$ of $k$ samples of the random variable $Y$. For $\epsilon > 0$ and $k$
sufficiently large, there exists a set $A_\epsilon^{(k)}$ of typical sequences $y_1, \ldots, y_k$ such that

\[
\Pr\bigl[(y_1, \ldots, y_k) \in A_\epsilon^{(k)}\bigr] > 1 - \epsilon
\]

and

\[
|A_\epsilon^{(k)}| \le 2^{k(H(Y) + \epsilon)},
\]

where $|A_\epsilon^{(k)}|$ denotes the cardinality of the set $A_\epsilon^{(k)}$. Namely, the sequences obtained by drawing
from $Y$ are likely to fall in a set of size determined by the uncertainty of $Y$. In this sense,
$H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ can be interpreted as a measure of the size of a subset of the overall state
space where the system is likely to be found, given knowledge about the states $X_{\tau(1)}, \ldots, X_{\tau(l)}$.

Figure 3: The upper bound $A(l)$ on $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ as a function of $l$.

For a large network, $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ is hard to compute directly; therefore, we upper bound
this quantity. The random variable $Y$ is a function of $X$, hence its entropy cannot be larger than
the entropy of $X$, and we obtain the simple upper bound

\[
H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)}) \le H(Y) \le H(X) = 162 \text{ bit}.
\]

We consider the upper bound

\[
H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)}) \le A(l) \triangleq \sum_{i=1}^{m} H(Y_i|X_{\tau(1)}, \ldots, X_{\tau(l)}),
\]

which follows from the chain rule for entropy [19]. The upper bound $A(l)$ on $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ is
depicted in Figure 3 as a function of $l$. We can see from Figure 3 that knowledge of the states of the
most determinative nodes reduces the uncertainty about the network's states significantly. In fact,
this bound is not very tight, hence we can expect $H(Y|X_{\tau(1)}, \ldots, X_{\tau(l)})$ to lie significantly
below this upper bound. This experiment demonstrates that the determinative power is unequally
distributed among the input nodes in the E. coli network. Another interpretation of the bound
$A(l)$ is the following. When $A(l)$ is small, on average $H(Y_i|X_{\tau(1)}, \ldots, X_{\tau(l)})$ must be small, and hence
$\Pr[Y_i = 1|X_{\tau(1)}, \ldots, X_{\tau(l)}]$ is close to one or to zero on average.
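
The bound $A(l)$ can be evaluated by brute force on a small collapsed network. The sketch below uses an invented three-input toy network with uniform inputs and takes the ordering `tau` as given; all names are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product
from math import log2

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def cond_entropy(f, n, A):
    """H(f(X) | X_A) for independent uniform inputs X in {-1,+1}^n."""
    total = 0.0
    for xa in product((-1, 1), repeat=len(A)):
        rows = [x for x in product((-1, 1), repeat=n)
                if all(x[a] == v for a, v in zip(A, xa))]
        total += 2.0 ** -len(A) * h(sum(f(x) == 1 for x in rows) / len(rows))
    return total

n = 3
states = [
    lambda x: x[0],                 # Y1 copies input 0
    lambda x: min(x[0], x[1]),      # Y2 = X0 AND X1
    lambda x: x[1] * x[2],          # Y3 = parity of X1 and X2
]
tau = [0, 1, 2]                     # assumed ordering by determinative power

# A(l) = sum_i H(Y_i | X_tau(1), ..., X_tau(l)) for l = 0, ..., n.
bound = [sum(cond_entropy(y, n, tau[:l]) for y in states) for l in range(n + 1)]
print(bound)
```

Because conditioning on more inputs cannot increase any of the summands, the sequence `bound` is non-increasing in $l$ and reaches zero once all inputs are known.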

6.2 Tolerance to Perturbations 

Finally, we discuss the average sensitivity of individual functions in the E. coli network. In Section 3
we found that the average sensitivity is small if the Fourier spectrum is concentrated on
the coefficients of low degrees, and that there are basically two types of functions for which this
is the case: functions for which the bias is high, i.e., the probability $\Pr[f(X) = 1]$ is close to
one or zero, and functions that depend on only a few variables. Figure 4 shows pairs of values
$(\mathrm{as}(f), \Pr[f(X) = 1])$ for each function in the E. coli network, assuming that $X_1, \ldots, X_n$ are
independent and $\Pr[X_i = 1] = 1/2$. Observe that the functions with high in-degree $K$ (i.e., number of
relevant input variables) could have a large average sensitivity, but $\mathrm{as}(f)$ is small, because these
functions are highly biased. Hence we can see from Figure 4 that the average sensitivity of the
functions is small (indeed close to the lower bound on the average sensitivity), because each function
either depends on few coordinates or is highly biased. For other distributions on the inputs, i.e.,
other values of $p = \Pr[X_i = 1]$, $\forall i \in [n]$, Figure 4 looks similar.



[Figure 4 (scatter plot): horizontal axis $\mathrm{as}(f)$, vertical axis $\Pr[f(X) = 1]$; one marker type per
in-degree $K = 2, \ldots, 11$, together with the lower bound (legend entry "L. Bound").]

Figure 4: Pairs of values $(\mathrm{as}(f), \Pr[f(X) = 1])$ of each function in the E. coli network, for different
in-degrees $K$ and uniformly distributed $X$. Moreover, the lower bound on the average sensitivity
$\mathrm{as}(f)$ is plotted (known as Poincaré's inequality).
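
The interplay between bias and average sensitivity can be reproduced on toy functions. The sketch below assumes uniform inputs, takes $\mathrm{as}(f)$ to be the sum of the influences $\Pr[f(X) \neq f(X \oplus e_i)]$, and uses the standard form of the Poincaré lower bound, $\mathrm{as}(f) \ge \mathrm{Var}(f(X)) = 4\Pr[f(X)=1](1 - \Pr[f(X)=1])$; the helper name and the two example functions are illustrative assumptions.

```python
from itertools import product

def bias_and_sensitivity(f, n):
    """Return (Pr[f(X) = 1], as(f)) for independent uniform X in {-1,+1}^n."""
    cube = list(product((-1, 1), repeat=n))
    p_one = sum(f(x) == 1 for x in cube) / len(cube)
    # Count, over all x and all coordinates i, how often flipping entry i changes f.
    flips = sum(f(x) != f(x[:i] + (-x[i],) + x[i + 1:])
                for x in cube for i in range(n))
    return p_one, flips / len(cube)

K = 6
and_k = lambda x: 1 if all(v == 1 for v in x) else -1       # highly biased
parity = lambda x: 1 if x.count(-1) % 2 == 0 else -1        # unbiased, maximally sensitive

results = {name: bias_and_sensitivity(g, K)
           for name, g in (("and", and_k), ("parity", parity))}
# Poincare-type lower bound: Var(f(X)) = 4 p (1 - p) with p = Pr[f(X) = 1].
poincare = {name: 4 * p1 * (1 - p1) for name, (p1, _) in results.items()}
print(results, poincare)
```

Both functions have in-degree 6, but the highly biased AND has an average sensitivity of only 0.1875, whereas parity attains the maximum value of 6.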



7 Conclusion 

In a Boolean network, properties such as tolerance to perturbations and statistical dependencies
between nodes are properties of single functions. Hence, we concentrated on the analysis of single
functions and considered a function $f(X)$ as a random variable, where $X$ is a product distributed
argument. For this setting, we used Fourier analysis of Boolean functions to investigate the mutual
information (MI) $\mathrm{MI}(f(X); X_A)$ between a function $f(X)$ and a set of variables $X_A$. We argued that
$\mathrm{MI}(f(X); X_A)$ is a measure of the determinative power of $X_A$. For the mutual information between
a single variable $X_i$ and $f(X)$, we found that $\mathrm{MI}(f(X); X_i)$ depends only on $(\hat{f}(\{i\}), \hat{f}(\emptyset))$ and is
strictly increasing in $|\hat{f}(\{i\})|$. Furthermore, if $\mathrm{MI}(f(X); X_i)$ is large, the influence $I_i(f)$ must be large.
Moreover, we proved inequalities that relate $\mathrm{MI}(f(X); X_A)$ to measures of perturbations, and gave
the necessary and sufficient conditions for statistical independence of $f(X)$ and $X_A$. $\mathrm{MI}(f(X); X_A)$
depends only on the coefficients $\{\hat{f}(S) : S \subseteq A\}$, whereas the average sensitivity of the variables in
$A$, i.e., a measure of perturbation of the variables in $A$, depends on $\{\hat{f}(S) : |S \cap A| > 0\}$, which is a
fundamental difference. However, for the class of unate functions, which are especially interesting
for biological networks, we found a direct relation between MI and influence. For unate functions,
$\mathrm{MI}(f(X); X_i) > 0$ for each relevant input. Therefore, in a biological network such as the E. coli
regulatory network, where each function is unate, a gene and its regulator must be statistically
dependent. As an application of our methods, we analyzed the large-scale regulatory network of
E. coli. We identified the most determinative nodes and demonstrated that knowledge about the
states of a small subset of them reduces the uncertainty about the states of all the other nodes
significantly. Furthermore, we showed that the E. coli network is tolerant to perturbations of its
inputs, and that this can be explained by the Fourier spectrum of the functions in the network
being concentrated on the coefficients of low degree.

Appendices

A Lemma 1

For the proofs of Theorems 3 and 5, we will need the following lemma.

Lemma 1. Let $f$ be a Boolean function, let $X$ be product distributed, and let $A \subseteq [n]$ and some
fixed $x_A \in \{-1,+1\}^{|A|}$ be given. Then

\[
\mathbb{E}[f(X)|X_A = x_A] = \sum_{S \subseteq A} \hat{f}(S)\,\Phi_S(x_A). \tag{12}
\]



Proof. Inserting the Fourier expansion of $f(X)$ into the left-hand side of (12) and utilizing
the linearity of conditional expectation yields

\[
\mathbb{E}[f(X)|X_A = x_A] = \sum_{S \subseteq [n]} \hat{f}(S)\, \mathbb{E}[\Phi_S(X)|X_A = x_A].
\]

For $S \subseteq A$,

\[
\mathbb{E}[\Phi_S(X)|X_A = x_A] = \Phi_S(x_A).
\]

Conversely, for $S \not\subseteq A$,

\[
\mathbb{E}[\Phi_S(X)|X_A = x_A] = 0.
\]

To see this, assume without loss of generality that $S = A \cup \{j\}$ and $j \notin A$. Using the decomposition
property of the basis function as given in Section 2.3,

\[
\mathbb{E}[\Phi_S(X)|X_A = x_A] = \mathbb{E}\Bigl[\prod_{i \in S} \Phi_{\{i\}}(X) \Bigm| X_A = x_A\Bigr]
 = \prod_{i \in S} \mathbb{E}[\Phi_{\{i\}}(X)|X_A = x_A],
\]

which is equal to zero as

\[
\mathbb{E}[\Phi_{\{j\}}(X)|X_A = x_A] = \mathbb{E}[\Phi_{\{j\}}(X)] = 0.
\]

□
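
Lemma 1 can be checked numerically by comparing the conditional expectation computed from the definition with the truncated Fourier sum. The sketch assumes the basis $\Phi_S(x) = \prod_{k \in S} (x_k - \mu_k)/\sigma_k$; the example function, the probabilities, and all helper names are hypothetical.

```python
from itertools import combinations, product
from math import prod, sqrt

def weight(x, p):
    """Probability of x under the product distribution with Pr[X_k = +1] = p[k]."""
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def phi(S, x, p):
    """Assumed basis function Phi_S(x) = prod_{k in S} (x_k - mu_k) / sigma_k."""
    return prod((x[k] - (2 * p[k] - 1)) / sqrt(1 - (2 * p[k] - 1) ** 2) for k in S)

def cond_mean(f, p, A, xa):
    """E[f(X) | X_A = x_A] computed directly from the definition."""
    rows = [x for x in product((-1, 1), repeat=len(p))
            if all(x[a] == v for a, v in zip(A, xa))]
    return sum(weight(x, p) * f(x) for x in rows) / sum(weight(x, p) for x in rows)

def fourier_sum(f, p, A, xa):
    """Right-hand side of (12): sum over subsets S of A of coefficient times Phi_S(x_A)."""
    cube = list(product((-1, 1), repeat=len(p)))
    x_fix = [1] * len(p)                 # entries outside A are never read by phi
    for a, v in zip(A, xa):
        x_fix[a] = v
    total = 0.0
    for r in range(len(A) + 1):
        for S in combinations(A, r):
            coeff = sum(weight(x, p) * f(x) * phi(S, x, p) for x in cube)
            total += coeff * phi(S, x_fix, p)
    return total

f = lambda x: 1 if (x[0] == 1 or (x[1] == 1 and x[2] == -1)) else -1
p, A = [0.2, 0.5, 0.9], (0, 2)
max_gap = max(abs(cond_mean(f, p, A, xa) - fourier_sum(f, p, A, xa))
              for xa in product((-1, 1), repeat=len(A)))
print(max_gap)
```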

B Proof of Theorem 1

For notational convenience, let $A = [n] \setminus \{i\}$. By definition of the conditional entropy,

\[
\begin{aligned}
H(f(X)|X_A) &= \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, H(f(X)|X_A = x_A) \\
 &= \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, h(\Pr[f(X) = 1|X_A = x_A]),
\end{aligned} \tag{13}
\]

where $h(\cdot)$ is the binary entropy function as defined in (5). Observe that

\[
h(\Pr[f(X) = 1|X_A = x_A]) = h(\Pr[X_i = 1])
\]

if

\[
f(X_1 = x_1, \ldots, X_i = 1, \ldots, X_n = x_n) \neq f(X_1 = x_1, \ldots, X_i = -1, \ldots, X_n = x_n),
\]

and

\[
h(\Pr[f(X) = 1|X_A = x_A]) = 0
\]

otherwise. Hence (13) becomes

\[
H(f(X)|X_A) = h(\Pr[X_i = 1]) \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, \mathbb{1}[f(x) \neq f(x \oplus e_i)],
\]

where $x \oplus e_i$ is the vector obtained from $x$ by flipping its $i$th entry (the indicator does not depend on
the $i$th entry of $x$), and Theorem 1 follows by using the definition of the influence.
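
The identity that this proof arrives at, $H(f(X)|X_{[n] \setminus \{i\}}) = h(\Pr[X_i = 1])\, I_i(f)$, can be checked by enumeration. The sketch assumes the influence $I_i(f) = \Pr[f(X) \neq f(X \oplus e_i)]$; the majority function and the probabilities are illustrative choices, and the helper names are hypothetical.

```python
from itertools import product
from math import log2, prod

def h(q):
    """Binary entropy in bits."""
    return 0.0 if q <= 0.0 or q >= 1.0 else -q * log2(q) - (1 - q) * log2(1 - q)

def weight(x, p):
    """Probability of x under the product distribution with Pr[X_k = +1] = p[k]."""
    return prod(p[k] if x[k] == 1 else 1 - p[k] for k in range(len(p)))

def cond_entropy(f, p, A):
    """H(f(X) | X_A) computed directly from the definition."""
    total = 0.0
    for xa in product((-1, 1), repeat=len(A)):
        rows = [x for x in product((-1, 1), repeat=len(p))
                if all(x[a] == v for a, v in zip(A, xa))]
        pa = sum(weight(x, p) for x in rows)
        p1 = sum(weight(x, p) for x in rows if f(x) == 1)
        total += pa * h(p1 / pa)
    return total

def influence(f, p, i):
    """Assumed definition: I_i(f) = Pr[f(X) != f(X with entry i flipped)]."""
    return sum(weight(x, p) for x in product((-1, 1), repeat=len(p))
               if f(x) != f(x[:i] + (-x[i],) + x[i + 1:]))

maj = lambda x: 1 if sum(x) > 0 else -1
p = [0.3, 0.5, 0.7]
gaps = []
for i in range(3):
    rest = tuple(k for k in range(3) if k != i)      # A = [n] \ {i}
    gaps.append(abs(cond_entropy(maj, p, rest) - h(p[i]) * influence(maj, p, i)))
print(gaps)
```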

C Proof of Theorem 2

According to Proposition 2,

lAif) = E fisf E ^ 



5C[n,]\0 ^ * ^ 



SC[n]\9 

SCA\0 

(14)

Next, we rewrite the lower bound on $H(f(X)|X_A)$ given by Theorem 4 as

\[
\sum_{S \subseteq A \setminus \emptyset} \hat{f}(S)^2 \ge 1 - \hat{f}(\emptyset)^2 - H(f(X)|X_A). \tag{15}
\]
By adding $H(f(X)) - H(f(X))$ on the right-hand side of (15), and using the definition of MI, (15)
becomes

\[
\sum_{S \subseteq A \setminus \emptyset} \hat{f}(S)^2 \ge \mathrm{MI}(f(X); X_A) - H(f(X)) + 1 - \hat{f}(\emptyset)^2. \tag{16}
\]

With $\mathrm{Var}(f(X)) = 1 - \hat{f}(\emptyset)^2$ and by using the inequality $H(f(X)) \le (\mathrm{Var}(f(X)))^{1/\ln(4)}$, given in
[24, Thm. 1.2], (16) becomes

\[
\sum_{S \subseteq A \setminus \emptyset} \hat{f}(S)^2 \ge \mathrm{MI}(f(X); X_A) - \psi(\mathrm{Var}(f(X))), \tag{17}
\]

with $\psi(\cdot)$ as defined earlier. Finally, Theorem 2 follows by combining (14) and (17).

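D Proof of Theorem 3

By definition, $f(X)$ and $X_A$ are statistically independent if and only if for all $x_A \in \{-1,+1\}^{|A|}$

\[
\Pr[f(X) = 1|X_A = x_A] = \Pr[f(X) = 1]. \tag{18}
\]

With

\[
\Pr[f(X) = 1|X_A = x_A] = \tfrac{1}{2} + \tfrac{1}{2}\,\mathbb{E}[f(X)|X_A = x_A]
\]

and application of Lemma 1 given in Appendix A, (18) becomes

\[
\sum_{S \subseteq A} \hat{f}(S)\,\Phi_S(x_A) = \hat{f}(\emptyset),
\]

or equivalently

\[
\sum_{S \subseteq A \setminus \emptyset} \hat{f}(S)\,\Phi_S(x_A) = 0. \tag{19}
\]

It follows from the Fourier expansion that (19) holds for all $x_A \in \{-1,+1\}^{|A|}$ if and only if
$\hat{f}(S) = 0$ for all $S \subseteq A \setminus \emptyset$, which proves the theorem.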

E Proof of Theorem 4

First,

\[
\Pr[f(X) = 1|X_A = x_A] = \tfrac{1}{2}\bigl(1 + \mathbb{E}[f(X)|X_A = x_A]\bigr)
 = \tfrac{1}{2}\Bigl(1 + \sum_{S \subseteq A} \hat{f}(S)\,\Phi_S(x_A)\Bigr) \triangleq q(x_A), \tag{20}
\]

where (20) follows from an application of Lemma 1. By definition of the conditional entropy,

\[
\begin{aligned}
H(f(X)|X_A) &= \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, H(f(X)|X_A = x_A) \\
 &= \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, h(\Pr[f(X) = 1|X_A = x_A]) \\
 &= \sum_{x_A \in \{-1,1\}^{|A|}} \Pr[X_A = x_A]\, h(q(x_A)) \qquad (21) \\
 &= \mathbb{E}[h(q(X_A))], \qquad (22)
\end{aligned}
\]

where $h(\cdot)$ is the binary entropy function as defined in (5). To obtain (21), we used (20), and the
expectation in (22) is with respect to the distribution of $X_A$.
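Observe that

\[
\begin{aligned}
\mathbb{E}[4 q(X_A)(1 - q(X_A))] &= \mathbb{E}\Bigl[1 - \Bigl(\sum_{S \subseteq A} \hat{f}(S)\,\Phi_S(X_A)\Bigr)^2\Bigr] \\
 &= 1 - \sum_{S \subseteq A} \sum_{U \subseteq A} \hat{f}(S)\hat{f}(U)\, \mathbb{E}[\Phi_S(X_A)\Phi_U(X_A)] \\
 &= 1 - \sum_{S \subseteq A} \hat{f}(S)^2,
\end{aligned} \tag{23}
\]

where (23) follows from the orthogonality of the basis functions.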

We first prove the lower bound in Theorem 4. Applying the lower bound on the binary entropy
function $h(p) \ge 4p(1 - p)$, given in [24, Thm. 1.2], to (22) yields

\[
H(f(X)|X_A) = \mathbb{E}[h(q(X_A))] \ge \mathbb{E}[4 q(X_A)(1 - q(X_A))],
\]

and the lower bound in Theorem 4 follows using (23).

Next we prove the upper bound in Theorem 4. Applying the upper bound on the binary entropy
function $h(p) \le (4p(1 - p))^{1/\ln(4)}$, given in [24, Thm. 1.2], to (22) yields

\[
H(f(X)|X_A) = \mathbb{E}[h(q(X_A))] \le \mathbb{E}\bigl[(4 q(X_A)(1 - q(X_A)))^{1/\ln(4)}\bigr]. \tag{24}
\]

The term $y = 4 q(X_A)(1 - q(X_A))$ in (24) is a random variable, and the function $y^{1/\ln(4)}$ is concave in $y$.
An application of Jensen's inequality (see e.g. [19]) yields $\mathbb{E}[y^{1/\ln(4)}] \le (\mathbb{E}[y])^{1/\ln(4)}$, hence the
right-hand side of (24) can be upper bounded as

\[
H(f(X)|X_A) \le \bigl(\mathbb{E}[4 q(X_A)(1 - q(X_A))]\bigr)^{1/\ln(4)}. \tag{25}
\]

Finally, the upper bound in Theorem 4 follows from combining (25) and (23).
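
The two bounds on the binary entropy function used in this proof, $4p(1-p) \le h(p) \le (4p(1-p))^{1/\ln(4)}$ with $h$ measured in bits, can be sanity-checked on a grid. This is only a numerical spot check under a small floating-point tolerance, not a proof; the grid resolution is an arbitrary choice.

```python
from math import log, log2

def h(p):
    """Binary entropy in bits."""
    return 0.0 if p <= 0.0 or p >= 1.0 else -p * log2(p) - (1 - p) * log2(1 - p)

tol = 1e-12                                   # tolerance for floating-point rounding
grid = [k / 1000 for k in range(1, 1000)]     # p = 0.001, 0.002, ..., 0.999
lower_ok = all(4 * p * (1 - p) <= h(p) + tol for p in grid)
upper_ok = all(h(p) <= (4 * p * (1 - p)) ** (1 / log(4)) + tol for p in grid)
print(lower_ok, upper_ok)
```

All three expressions coincide at $p = 1/2$, where they equal one, which is why the bounds of Theorem 4 become tight when the conditional distribution of $f(X)$ is close to uniform.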

F Proof of Theorem 5

Equation (22) from the proof of Theorem 4 in Appendix E is

\[
H(f(X)|X_A) = \mathbb{E}[h(q(X_A))]. \tag{26}
\]

Inserting $q(X_A)$ as given by (20) proves the theorem.